connectors-ci: make source-file testable in airbyte-ci #27107

Merged
merged 15 commits into from
Jun 8, 2023

Conversation

alafanechere
Contributor

@alafanechere alafanechere commented Jun 7, 2023

What

Relates to #25053

Fixing source-file

The source-file integration tests use docker-compose to spin up an SFTP container for testing.
To make container orchestration and connectivity work in airbyte-ci, we need to bind the Python integration tests to the global Docker host.
The Docker host also needs access to the files we want to mount into the SFTP container. We therefore move them to the /tmp folder, which both the container under test and the Docker host can access, so that the Docker host can mount them into the SFTP container.

Fixing source-file-secure

source-file-secure has an implicit dependency on source-file: its Docker image is based on the source-file Docker image, in which the source-file package is installed.
But outside of the dockerized execution, running import source_file in Python from source-file-secure fails, because our pipeline did not handle installing other connector packages.
To fix this problem:

  • The dependency on source-file in source-file-secure is declared explicitly inside its setup.py (@evantahler we previously chatted about the benefits of explicitly declaring variant dependencies in setup.py)
  • We detect the dependencies to mount in the Dagger pipeline by running python setup.py egg_info
  • We mount the required dependencies when installing source-file-secure in the test environment.
  • When building the source-file-secure docker image we mount and install the same dependencies.
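
A minimal sketch of the detection step described above, not the pipeline's actual code. It assumes the requirements metadata written by python setup.py egg_info lists local dependencies as file:// URLs; find_local_dependency_paths and the sample text are hypothetical and used only for illustration.

```python
import re

# Matches the absolute path that follows a file:// URL in a requirement line.
FILE_URL = re.compile(r"file://(/\S+)")


def find_local_dependency_paths(requires_txt: str) -> list[str]:
    """Return the absolute paths of file:// dependencies found in the text."""
    paths = []
    for line in requires_txt.splitlines():
        match = FILE_URL.search(line)
        if match:
            paths.append(match.group(1))
    return paths


# Illustrative input only; the real metadata layout may differ.
sample = """airbyte-cdk~=0.2
source-file @ file:///local_dependencies/source-file
"""
print(find_local_dependency_paths(sample))
```

Each path returned this way is a candidate directory to mount into the container before the connector package is installed.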

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/file labels Jun 7, 2023
@github-actions
Contributor

github-actions bot commented Jun 7, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan and you've followed all steps in the Breaking Changes Checklist
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • You, or an Airbyter, have run /test successfully on this PR - or on a non-forked branch
  • You've updated the connector's metadata.yaml file (new!)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@alafanechere alafanechere changed the base branch from master to 10730-redshiftJDBParameter June 7, 2023 16:26
@alafanechere alafanechere requested review from a team as code owners June 7, 2023 16:26
@alafanechere alafanechere changed the base branch from 10730-redshiftJDBParameter to master June 7, 2023 16:26
@alafanechere alafanechere changed the base branch from master to 16658-source-twilio-config June 7, 2023 16:29
@alafanechere alafanechere changed the base branch from 16658-source-twilio-config to master June 7, 2023 16:29
Comment on lines +1 to +8
### WARNING ###
# This Dockerfile will soon be deprecated.
# It is not used to build the connector image we publish to DockerHub.
# The new logic to build the connector image is declared with Dagger here:
# https://github.com/airbytehq/airbyte/blob/master/tools/ci_connector_ops/ci_connector_ops/pipelines/actions/environments.py#L771

# If you need to add a custom logic to build your connector image, you can do it by adding a finalize_build.sh or finalize_build.py script in the connector folder.
# Please reach out to the Connectors Operations team if you have any question.
Contributor Author

source-file-secure is now fully built from Dagger. As source-file needs to be mounted at build time it was easier to accomplish it with Dagger.

@@ -1,3 +1,2 @@
-e ../../bases/connector-acceptance-test
-e ../source-file
Contributor Author

source-file dependency is now explicitly defined in setup.py.


from setuptools import find_packages, setup


def local_dependency(name: str) -> str:
Contributor Author

Compute the absolute path to the local dependencies. In dagger pipelines we mount local dependencies to /local_dependencies.
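
A sketch of what such a helper might look like, offered as an illustration rather than the code in this PR. It assumes the Dagger pipeline signals itself through a DAGGER_BUILD environment variable (the name appears in the review discussion; its exact semantics are an assumption here) and that, outside Dagger, the dependency lives in a sibling directory of the connector. The connector_dir parameter is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Mount point mentioned in the comment above.
DAGGER_MOUNT_ROOT = Path("/local_dependencies")


def local_dependency(name: str, connector_dir: Optional[Path] = None) -> str:
    """Build a PEP 508 direct reference pointing at a local package directory."""
    if os.environ.get("DAGGER_BUILD"):  # assumed flag set by the pipeline
        dependency_path = DAGGER_MOUNT_ROOT / name
    else:
        # Assumed layout: the dependency sits next to the connector folder.
        base = connector_dir or Path.cwd()
        dependency_path = (base.parent / name).resolve()
    return f"{name} @ file://{dependency_path.as_posix()}"


print(local_dependency("source-file"))
```

The returned string can be placed directly in the install_requires list of setup.py, as the diff further down does with local_dependency("source-file").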

Contributor

Oh dear python!

I like this solution.

My only reservations are having

  1. Dagger relying on file:// to mark relative dependencies, or pip -e, as this isn't always the case (example)
  2. our connector setup.py's relying on knowing how dagger works under the hood.

So I want to ask a question for #1 and propose a possible idea for 2

  1. I don't think we can do away with this, but how come we don't mount the relative import to the same relative location in the Dagger file tree?

  2. What if, in the Dagger machine, we symlinked the local host path to the Dagger /local_dependencies version? That would remove the need for having DAGGER_BUILD in our connectors folder.

Comment on lines -15 to -24
try:
    import source_file.source
except ModuleNotFoundError:
    current_dir = os.path.dirname(os.path.abspath(__file__))
    parent_source_local = os.path.join(current_dir, "../../source-file")
    if os.path.isdir(parent_source_local):
        sys.path.append(parent_source_local)
    else:
        raise RuntimeError("not found parent source folder")
    import source_file.source
Contributor Author

This is no longer required, as the source-file connector is now installed as a normal Python dependency with pip install.

@alafanechere alafanechere requested review from evantahler, bnchrch and a team June 7, 2023 20:23
@github-actions
Contributor

github-actions bot commented Jun 7, 2023

Affected Connector Report

The latest commit has removed all connector-related changes. There are no more dependent connectors for this PR.

@alafanechere
Contributor Author

alafanechere commented Jun 7, 2023

/test connector=connectors/source-file

🕑 connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/5205532222
✅ connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/5205532222
Python tests coverage:

Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/utils.py         13      1    92%
source_file/source.py        83      7    92%
source_file/client.py       335     56    83%
---------------------------------------------
TOTAL                       433     64    85%
Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/client.py       335     56    83%
source_file/utils.py         13      8    38%
source_file/source.py        83     62    25%
---------------------------------------------
TOTAL                       433    126    71%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: Incremental syncs are not supported on this connector.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:598: The previous and actual discovered catalogs are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:695: This tests currently leads to too much failures. We need to fix the connectors at scale first.
======================== 35 passed, 4 skipped in 53.89s ========================

@alafanechere
Contributor Author

alafanechere commented Jun 7, 2023

/test connector=connectors/source-file-secure

🕑 connectors/source-file-secure https://github.com/airbytehq/airbyte/actions/runs/5205532489
❌ connectors/source-file-secure https://github.com/airbytehq/airbyte/actions/runs/5205532489
🐛 https://gradle.com/s/sys35fyaxoyew

Build Failed

Test summary info:

Could not find result summary

@evantahler
Contributor

evantahler commented Jun 8, 2023

While out of scope for this PR, I have the opinion that using local docker containers for testing connectors is bad for a number of reasons:

  • CI anti-patterns like this
  • slow to spin up and down containers for each test/suite
  • false sense of isolation. We are in general poor at cleaning up files/DBs after tests. This problem is masked by destroying the container, and it appears when moving to a hosted DB, like when testing Snowflake
  • unobservable. If a test fails, it is hard to connect to the DB or S3 bucket to see what went wrong

To that end, I would propose that we work to remove all test containers like this, and set up persistent test hosts for all connectors

Contributor

@evantahler evantahler left a comment

👍 from me, mainly about explicit dependencies!
I'd wait for the other reviews on the 🐍 code before merging


# If you need to add a custom logic to build your connector image, you can do it by adding a finalize_build.sh or finalize_build.py script in the connector folder.
# Please reach out to the Connectors Operations team if you have any question.
FROM airbyte/source-file:0.3.10
Contributor

Nice.

@@ -21,8 +32,10 @@
"xlrd==2.0.1",
"openpyxl==3.0.10",
"pyxlsb==1.0.9",
local_dependency("source-file"),
Contributor

👍

@@ -24,7 +24,7 @@
"pyxlsb==1.0.9",
]

- TEST_REQUIREMENTS = ["pytest~=6.2", "pytest-docker~=1.0.0", "pytest-mock~=3.6.1"]
+ TEST_REQUIREMENTS = ["pytest~=6.2", "pytest-docker~=1.0.0", "pytest-mock~=3.6.1", "docker-compose"]
Contributor

Is this a python package to work with docker?

Contributor Author

The source-file integration tests use docker-compose without declaring it as a test dependency. I added it so that our pipeline installs it in the test environment, which does not have docker-compose by default.

Contributor

@bnchrch bnchrch left a comment

Great work here @alafanechere

This must have been annoying to work through

I had one alternative idea, but I'm not positive it would work and in the spirit of "done not perfect"


if await get_file_contents(container, "setup.py"):
    container_with_egg_info = container.with_exec(["python", "setup.py", "egg_info"])
    egg_info_output = await container_with_egg_info.stdout()
    for line in egg_info_output.split("\n"):
Contributor

This must have been painful, well done


if await get_file_contents(container, "setup.py"):
    container = container.with_exec(install_connector_package_cmd)
if await get_file_contents(container, "requirements.txt"):
Contributor

Nit: We will likely need one for pyproject.toml at some point.

@alafanechere
Contributor Author

My only reservations are having

  • Dagger relying on file:// to mark relative dependencies, or pip -e, as this isn't always the case (example)
  • our connector setup.py's relying on knowing how dagger works under the hood.

I believe there is indeed a cleaner solution that I'd like to tackle later:
Have a single function that builds a Python connector container by:

  • Detecting its local dependencies from pyproject.toml, requirements.txt or setup.py
  • Mounting these dependencies + the connector code with paths relative to the airbyte repo
  • Installing the dependencies according to the context (e.g. install test deps if it's the container used to run tests, or install a specific CDK version if the code change includes a CDK change, etc.)
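
As a rough illustration of the first bullet only, here is how local dependencies could be detected from a requirements.txt. This is a sketch of the proposed idea, not existing pipeline code: local_requirements is a hypothetical helper, and treating only ./ and ../ prefixes as local is a simplifying assumption.

```python
def local_requirements(requirements_txt: str) -> list[str]:
    """Return relative paths of local packages referenced in requirements.txt."""
    local = []
    for raw_line in requirements_txt.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-e "):
            line = line[len("-e "):].strip()  # drop the editable flag
        # Assumption: a local dependency is written as a relative path.
        if line.startswith(("./", "../")):
            local.append(line)
    return local


# The entries below mirror the requirements.txt lines quoted earlier in this PR.
sample = """-e ../../bases/connector-acceptance-test
-e ../source-file
-e .
"""
print(local_requirements(sample))
```

The bare "." entry (the connector itself) is deliberately left out, since only the other packages would need to be mounted alongside the connector code.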

Then we'll have a single base container on top of which we can:

  • run unit/integration tests
  • build for CAT and publish

I did not take this approach right now because it required too much refactoring. As we still rely on Dockerfiles for Python connectors (except source-file-secure), we still have one container for unit/integration tests (dynamically built with Dagger) and a different one for build and publish, based on the Dockerfile.

This is definitely something we'll tackle when we want to remove Dockerfiles for Python connectors.

@alafanechere
Contributor Author

alafanechere commented Jun 8, 2023

/test connector=connectors/source-file-secure

🕑 connectors/source-file-secure https://github.com/airbytehq/airbyte/actions/runs/5211825882
❌ connectors/source-file-secure https://github.com/airbytehq/airbyte/actions/runs/5211825882
🐛 https://gradle.com/s/vggfbmqvgvntm

Build Failed

Test summary info:

Could not find result summary

@alafanechere
Contributor Author

alafanechere commented Jun 8, 2023

/test connector=connectors/source-file-secure

🕑 connectors/source-file-secure https://github.com/airbytehq/airbyte/actions/runs/5212111191
✅ connectors/source-file-secure https://github.com/airbytehq/airbyte/actions/runs/5212111191
Python tests coverage:

Name                             Stmts   Miss  Cover
----------------------------------------------------
source_file_secure/__init__.py       2      0   100%
source_file_secure/source.py        33      3    91%
----------------------------------------------------
TOTAL                               35      3    91%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:598: The previous and actual discovered catalogs are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:695: This tests currently leads to too much failures. We need to fix the connectors at scale first.
================= 35 passed, 5 skipped, 39 warnings in 35.11s ==================

Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/file connectors/source/file-secure
4 participants